Abstract. We investigate the properties of uniform doubly stochastic random matrices, that is
non-negative matrices conditioned to have their rows and columns sum to 1. The rescaled marginal
distributions are shown to converge to exponential distributions and indeed even large sub-matrices
of side-length $o(n^{1/2-\epsilon})$ behave like independent exponentials. We determine the limiting
empirical distribution of the singular values of the matrix. Finally the mixing time of the associated
Markov chains is shown to be exactly 2 with high probability.



Random matrices have become a central area of focus for modern probability theory and numerous
models have been intensely studied including Wigner, Wishart, GOE and GUE matrices [3]. In
this paper we study a model for which much less is known, namely uniformly chosen elements of the
set of doubly stochastic matrices (called Uniformly Distributed Stochastic Matrices). The Birkhoff
polytope is an $(n-1)^2$-dimensional polytope in $\mathbb{R}^{n^2}$ constituting the set of doubly
stochastic matrices and is the convex hull of the permutation matrices (see e.g. [H]). While its extreme
points are sparse matrices we shall see that typical elements chosen according to the uniform distribution
are by contrast very dense. Little is known about the properties of uniformly distributed stochastic
matrices as they fall outside the scope of techniques from the usual random matrix theory; however,
important recent progress has been made by Barvinok and Hartigan.

We will let $X = (X_{ij})_{i,j=1,\ldots,n}$ denote a uniform doubly stochastic matrix. By symmetry its rows
and columns are exchangeable and all its entries have the same marginal distribution. It is natural
then to ask what is the limiting distribution of $nX_{11}$, the first entry rescaled to have mean 1. In
our first result we determine that the rescaled marginal distribution converges to an exponential
random variable of mean 1.

Theorem 1. With $X = (X_{ij})_{i,j=1,\ldots,n}$ a uniformly chosen doubly stochastic matrix we have that,

$$nX_{11} \xrightarrow{d} \exp(1)$$

as $n \to \infty$ where the convergence is in total variation distance. Further, for any $\epsilon > 0$,

$$d_{TV}(nX_{11}, \exp(1)) = O(n^{-1/2+\epsilon}).$$

A natural extension to this question is to ask about the joint distribution for a collection of
several entries. It can be shown using the same approach that finite collections of random variables
converge to independent exponentials with mean 1. This convergence holds not just in distribution
but also in total-variation distance and its moments converge to the moments of independent
exponentials (see Section 3.2). We believe that in many ways uniformly distributed stochastic
matrices behave much like matrices of independent exponentials. For example the largest entry of
the matrix is at most $(2 + o(1))\frac{1}{n}\log n$ with high probability.

Theorem 2. For any $\epsilon > 0$,

$$P\left( \max_{1 \le i,j \le n} nX_{ij} > (2 + \epsilon) \log n \right) \to 0,$$

as $n \to \infty$.

Another question one may ask is the limiting distribution of the singular values of
$\tilde{X} = n^{1/2}(X - EX)$. Denote these by $\sigma_1(\tilde{X}) \le \sigma_2(\tilde{X}) \le \ldots \le \sigma_n(\tilde{X})$.
Letting $\mu$ denote the measure on $[0,2]$ with density

$$\frac{1}{\pi}\sqrt{4 - x^2}$$

we have the following result.

Theorem 3. The limiting empirical singular value distribution of $\tilde{X}$ is given by

$$\frac{1}{n}\sum_{s=1}^{n} \delta_{\sigma_s(\tilde{X})} \to \mu$$

where the convergence is in the weak topology, in probability as $n \to \infty$.

We conjecture that the empirical spectral distribution converges to the circular law. 

One natural question is to ask how large a sub-matrix one can take so that the entries are still
asymptotically independent. This problem was studied in the context of the random orthogonal
matrix [31] where it was shown that a $k \times k$ sub-matrix is asymptotically distributed as independent
normal random variables in total variation provided $k = o(n^{1/2})$, answering a question of the second
author [24]. In [31] it is further shown that order $n/\log n$ entries simultaneously converge if weaker
topologies are used. Here we show that for sub-matrices of uniformly distributed stochastic matrices
of size almost $n^{1/2}$ the entries are asymptotically independent.

Theorem 4. Let V denote the projection of a uniformly distributed stochastic matrix onto the 
k x k-sub-matrix of its first k rows and columns and let A be a k x k matrix of independent mean 
one exponential random variables. When k = the rescaled law ofV converges to A, 

$$d_{TV}(nV, A) \to 0$$

as $n \to \infty$ where $d_{TV}$ denotes the total variation distance.

Unlike most other classes of random matrices, uniformly distributed stochastic matrices are of
course stochastic which raises the question of the properties of the associated Markov chains. For
any doubly stochastic Markov transition kernel the stationary distribution is the uniform distribution.
For a uniform stochastic (but not necessarily doubly stochastic) matrix, that is a uniformly
chosen Markov chain, the mixing time is two asymptotically almost surely [1]. We show that this
holds also for uniformly chosen doubly stochastic random matrices.

Theorem 5. The mixing time of the Markov chain given by a uniform doubly stochastic matrix is
with high probability 2.

In Section 1 we give background and history for the Birkhoff polytope. In Section 2 we give the
proofs of Theorems 1 and 4. Then in Section 3 we begin by studying polytopes of matrices with
non-constant row sums. By establishing that the volumes of the polytopes are maximized when the
row and column sums are equal, we get strong control over the distribution of a row in a uniformly
distributed stochastic matrix through which we can bound the tails of the marginal distributions,
establishing convergence of the moments and Theorem 2. Finally, knowing that the entries are not
too large allows us to show strong concentration for the entries of $X^2$ which guarantees that the
mixing time is 2.

1. Background 

This section gives background and references for four topics that motivate our work: the Birkhoff
polytope, prior distributions on Markov chains, limit theorems for entries of large random matrices
in classical compact groups and contingency tables with fixed row and column sums.

1.1. The Birkhoff Polytope. The set $\mathcal{M}_n$ of $n \times n$ doubly stochastic matrices is known as the
Birkhoff polytope, the bistochastic polytope and the assignment polytope. It is a basic object
of study in operations research because of its appearance as the feasible set for the assignment
problem. Given a cost matrix $C_{ij}$ this asks for a permutation $\sigma$ minimizing $\sum_i C_{i\sigma(i)}$. This is
the same problem as minimizing $\sum_{ij} C_{ij}M_{ij}$ for $M \in \mathcal{M}_n$ because of Birkhoff's Theorem: the
permutation matrices are the extreme points of $\mathcal{M}_n$. A thorough treatment of the assignment
problem is in [33].
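The easy direction of this equivalence can be checked numerically. The following Python sketch is an illustration only: the cost matrix, the function names and the use of normalised exponential weights to build a convex combination are our own assumptions, not taken from [33]. It finds the best permutation by brute force and confirms that a convex combination of permutation matrices never has smaller linear cost.

```python
import itertools
import random


def perm_cost(cost, perm):
    """Cost sum_i C[i][perm[i]] of the assignment i -> perm[i]."""
    return sum(cost[i][perm[i]] for i in range(len(perm)))


def best_permutation(cost):
    """Brute-force minimiser over all n! permutations (small n only)."""
    n = len(cost)
    return min(itertools.permutations(range(n)), key=lambda p: perm_cost(cost, p))


def random_convex_combination(n, rng):
    """A doubly stochastic matrix built as a random convex combination
    of all n! permutation matrices (weights: normalised exponentials)."""
    perms = list(itertools.permutations(range(n)))
    weights = [rng.expovariate(1.0) for _ in perms]
    total = sum(weights)
    m = [[0.0] * n for _ in range(n)]
    for w, p in zip(weights, perms):
        for i in range(n):
            m[i][p[i]] += w / total
    return m


def linear_cost(cost, m):
    """Linear objective sum_{ij} C[i][j] * M[i][j]."""
    n = len(cost)
    return sum(cost[i][j] * m[i][j] for i in range(n) for j in range(n))


# Hypothetical 3 x 3 cost matrix used only for illustration.
EXAMPLE_COST = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]
```

Since the objective is linear, its value at a convex combination is the same convex combination of permutation costs, which is why the minimum over $\mathcal{M}_n$ is attained at a permutation matrix.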

Because of this connection, the structure of $\mathcal{M}_n$ has been intensively studied. Two permutations
$\sigma, \eta$ are adjacent on $\mathcal{M}_n$ if and only if $\sigma\eta^{-1}$ is a cycle (see [25] page 214). The diameter (the
maximum distance between two vertices on the skeleton) of $\mathcal{M}_n$ is two [25]. The face structure of
$\mathcal{M}_n$ is described in [11]. Finding a closed form expression for the volume of $\mathcal{M}_n$ is a well known
open problem. The volume is a rational number and is known for $n < 14$ (see [16] and references
therein). The combinatorics suggest a simple probability problem: what is the mixing time of the
nearest neighbor random walk on vertices of $\mathcal{M}_n$? Pak [39] showed that it is two.

Birkhoff's characterisation of the extreme points is "equivalent" to other basic theorems in
combinatorics such as König's Lemma, Hall's Marriage Theorem and the Max-flow Min-Cut Theorem.
A splendid account of these connections is in [33].

There are other polytopes with similarly nice descriptions. For example, the symmetric doubly
stochastic matrices have extreme points $\frac{1}{2}(A_\sigma + A_\sigma^{T})$ with $A_\sigma$ the permutation matrix of $\sigma$ [40].
Perhaps the methods and results of our paper can be used to study the behavior of a randomly
chosen point in these polytopes. The properties of the random tri-diagonal doubly stochastic
matrices are thoroughly studied in [19].

1.2. Statistical Analysis of Markov Chains. Our original motivation for this work comes from
the statistical analysis of a Markov chain on $\{1, 2, \ldots, n\}$ with unknown transition matrix
$(A_{ij}) \in Q_n$ ($Q_n$ the set of stochastic matrices). One observes a run $R_0, R_1, \ldots, R_N$ and is required to
estimate $(A_{ij})$. A Bayesian approach to this problem starts with a prior distribution on $Q_n$. The
classical Bayesian approach, using conjugate priors, sets each row to be an independent Dirichlet
distribution. One natural choice has each Dirichlet distribution as uniform on the $n$-simplex. This
gives the measure studied below. For background and references see [35], [22], [42].

Recent developments put priors on natural subclasses of Markov chains. For example [20^ 0] 
develop and apply priors for reversible Markov chains and [5] develop priors for higher order Markov 
chains.

It is natural to consider priors on the space of Markov chains with a fixed (known) stationary
distribution. This is again a connected convex set. Perhaps the most natural example is the uniform
distribution on $\{1, 2, \ldots, n\}$. Now the set of transition matrices is the Birkhoff polytope and the
uniform distribution is a natural prior. Understanding the uniform distribution for large $n$ leads to
the topics in this paper.

Knowing about Birkhoff's Theorem it is also natural to study the prior measure on $\mathcal{M}_n$ resulting
from a uniform combination of extreme points. Thus if $A_\sigma$ is the permutation matrix corresponding
to $\sigma$ and $\{\lambda_\sigma\}$ is a uniform point of the $n!$-simplex then $M = \sum_{\sigma \in S_n} \lambda_\sigma A_\sigma$ is a uniform combination
of extreme points. This distribution was proposed and studied in [37] as a way to put a prior on
the parameters of an $n \times n$ contingency table with known uniform margins. The following result
suggests this is a strange distribution, sharply concentrated about the matrix with all entries $1/n$.



Proposition 1.1. Let $M \in \mathcal{M}_n$ be a uniform convex combination of extreme points. Then

$$E \sum_{i,j} \left| M_{ij} - \frac{1}{n} \right| \le n\sqrt{\frac{n-1}{n!+1}}.$$



Proof. The distribution of $M_{11}$ is given by a Beta$(a,b)$ distribution with $a = (n-1)!$ and
$b = (n-1)(n-1)!$, which has mean $a/(a+b) = 1/n$ and variance $ab/((a+b)^2(a+b+1)) = (n-1)/(n^2(n!+1))$.
Then by the symmetry of the entries

$$E \sum_{i,j} \left| M_{ij} - \frac{1}{n} \right| = n^2 E\left| M_{11} - \frac{1}{n} \right| \le n^2\sqrt{\mathrm{Var}\, M_{11}} = n\sqrt{\frac{n-1}{n!+1}}.$$
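The moment computation in this proof is easy to verify with exact rational arithmetic. The sketch below is a sanity check only; the function names `entry_moments` and `l1_bound` are illustrative. It evaluates the Beta$(a,b)$ mean and variance for a few values of $n$ and compares them with the closed forms $1/n$ and $(n-1)/(n^2(n!+1))$.

```python
from fractions import Fraction
from math import factorial


def entry_moments(n):
    """Exact mean and variance of M_11 ~ Beta(a, b) with
    a = (n-1)! and b = (n-1)(n-1)!, so that a + b = n!."""
    a = factorial(n - 1)
    b = (n - 1) * factorial(n - 1)
    mean = Fraction(a, a + b)
    var = Fraction(a * b, (a + b) ** 2 * (a + b + 1))
    return mean, var


def l1_bound(n):
    """Right-hand side n * sqrt((n-1)/(n!+1)) of Proposition 1.1."""
    return n * ((n - 1) / (factorial(n) + 1)) ** 0.5


for n in (3, 5, 8, 12):
    mean, var = entry_moments(n)
    # Closed forms quoted in the proof.
    assert mean == Fraction(1, n)
    assert var == Fraction(n - 1, n * n * (factorial(n) + 1))
    print(n, float(var), l1_bound(n))
```

The printed bound decays roughly like $n^{3/2}/\sqrt{n!}$, which makes the sharp concentration about the matrix with all entries $1/n$ concrete.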



Of course, this prior is absolutely continuous with respect to the uniform distribution and a
sufficiently large amount of data will swamp the prior (although this may be prohibitively large when
$n$ is large).

A variety of measures on the stochastic matrices were studied in the subject of "random random
walks" [27]. This area was initiated with a theorem of Aldous and Diaconis [1]. If an $n \times n$
stochastic matrix is chosen by making the rows uniform on the $n$-simplex the expected time to
stationarity is small, indeed two steps suffice (but one does not). This suggests that this model
does not capture the essential features of real Markov chains which are usually "local". Much of
the work thus restricts attention to random walks on finite groups $G$ (see [27] for more details).

Our discussion leaves many points untouched. To generate points from the uniform distribution 
on $\mathcal{M}_n$ we use a basic "Gibbs sampling algorithm": pick a pair of distinct rows and a pair of
distinct columns at random. These intersect in a $2 \times 2$ matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. This is replaced
by $\begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix}$ chosen uniformly on the set of matrices with the same row and column sums as $A$.
This is easy to do by choosing $a'$ uniformly from the relevant range. We would like to understand the
running time of this algorithm. A host of other algorithms for uniform choice in a compact set is 
in ®- 
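A minimal Python sketch of the Gibbs step described above follows. It assumes floating-point entries and a start from the matrix with all entries $1/n$; the names `gibbs_step` and `sample_doubly_stochastic`, the seed and the number of steps are illustrative choices, and nothing here addresses how many steps are needed for the chain to be close to uniform. Writing $b' = a + b - a'$, $c' = a + c - a'$ and $d' = d - a + a'$, non-negativity forces $\max(0, a - d) \le a' \le \min(a + b, a + c)$, which is the "relevant range".

```python
import random


def gibbs_step(x, rng):
    """One update: resample a random 2 x 2 minor of x, keeping its
    row and column sums (and hence those of x) fixed."""
    n = len(x)
    i, k = rng.sample(range(n), 2)   # two distinct rows
    j, l = rng.sample(range(n), 2)   # two distinct columns
    a, b, c, d = x[i][j], x[i][l], x[k][j], x[k][l]
    lo = max(0.0, a - d)             # keeps d' = d - a + a' >= 0
    hi = min(a + b, a + c)           # keeps b' and c' >= 0
    # Clamp to guard against floating-point rounding at the end-points.
    a_new = min(max(rng.uniform(lo, hi), lo), hi)
    x[i][j] = a_new
    x[i][l] = a + b - a_new
    x[k][j] = a + c - a_new
    x[k][l] = d - a + a_new


def sample_doubly_stochastic(n, steps, seed=0):
    """Run `steps` Gibbs updates starting from the flat matrix."""
    rng = random.Random(seed)
    x = [[1.0 / n] * n for _ in range(n)]
    for _ in range(steps):
        gibbs_step(x, rng)
    return x
```

Because the set of $2 \times 2$ matrices with given margins is a line segment parametrised linearly by $a'$, drawing $a'$ uniformly from the interval is the same as drawing the minor uniformly from that set.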

The posterior distribution on $\mathcal{M}_n$ after observing the Markov chain of length $N$ is proportional to
$\prod_{i,j} x_{ij}^{N(i,j)}$ where $N(i,j)$ is the number of observed transitions from $i$ to $j$ in the run. How do such
measures behave? Our work suggests a heuristic: the measures should behave like product Dirichlet
distributions, the $i$th row having density proportional to $\prod_j x_{ij}^{N(i,j)}$. The known properties of the
Dirichlet distribution now make basic questions accessible. For example, the Bayes estimate of the
transition matrix is easy to compute.
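Under the product Dirichlet heuristic, row $i$ has density proportional to $\prod_j x_{ij}^{N(i,j)}$, that is a Dirichlet distribution with parameters $N(i,j) + 1$, whose mean is $(N(i,j)+1)/(\sum_j N(i,j) + n)$. The Python sketch below computes this estimate; it is an illustration under that heuristic only (states are labelled $0, \ldots, n-1$ and the function names are our own), not the exact posterior mean on the Birkhoff polytope, and its columns need not sum to 1.

```python
from fractions import Fraction


def transition_counts(run, n):
    """N(i, j): number of observed transitions i -> j in the run."""
    counts = [[0] * n for _ in range(n)]
    for s, t in zip(run, run[1:]):
        counts[s][t] += 1
    return counts


def product_dirichlet_mean(counts):
    """Row-wise posterior mean under independent Dirichlet rows with
    parameters N(i, j) + 1 (the heuristic, not the exact posterior)."""
    n = len(counts)
    estimate = []
    for row in counts:
        total = sum(row) + n
        estimate.append([Fraction(c + 1, total) for c in row])
    return estimate
```

For example, the run `[0, 1, 1, 2, 0, 1]` on three states has two observed transitions out of state 0, both to state 1, so the first row of the estimate is $(1/5, 3/5, 1/5)$.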

1.3. Elements of Random Matrices. The present paper has many points of contact with the
ongoing study of the behavior of entries of a uniformly chosen random matrix in one of the classical
compact groups $O_n$ or $U_n$. These problems were originally studied to understand the 'equivalence
of ensembles' in statistical mechanics. Indeed, the first row of a random matrix in $O_n$ is uniformly
distributed on the $n$-sphere, the micro-canonical ensemble. The entries multiplied by $\sqrt{n}$
are approximately independent standard normal, the canonical ensemble. This is an early theorem
of Borel; see [22] for a historical review, sharp statements and pointers to the work of Levy and
others. Later these theorems were extended and used to prove sharp finite forms of de Finetti's
theorems and many extensions [24].

For $M$ chosen uniformly on $U_n$, the entries multiplied by $\sqrt{n}$ are approximately independent
standard complex normal. This has been proved in various senses. For example [31] shows that
an $m \times m$ block is close to normal in total variation if $m = o(\sqrt{n})$. For other topologies [29]
shows independent normal behaviour persists for $m = o(n/\log n)$. Other global features, such as the
maximum entry [30], traces of powers of $M$ [21, 17] and arbitrary linear combinations of the entries
[15] behave like normals as well. Of course there are differences. The eigenvalues of a random
element of $U_n$ lie on the unit circle while the eigenvalues of independent normals fill out the disk
uniformly. For refinements, see [36], [38].

Yuval Peres suggested that these results may have a close connection to the Birkhoff polytope.
Let $M$ be uniform in $U_n$ and set $N_{ij} = |M_{ij}|^2$. Then $N$ is doubly stochastic with entries approximately
independent and exactly exponentially distributed. While we show in Section 3.1 that these
distributions are not the same it seems likely that they share many properties.

Classical results for equivalence of ensembles show equivalence of micro-canonical and canonical
ensembles which result from fixing low dimensional sufficient statistics. The results above, and in the
present paper, show that equivalences of various sorts persist after conditioning on high dimensional
statistics: if $\{E_{ij}\}$ is a matrix of independent exponentials, the conditional distribution given that
all the row and column sums are equal to one is uniform on $\mathcal{M}_n$. More background on equivalence
of ensembles can be found in [42] and [32].

1.4. Magic squares and contingency tables. There is a close connection between the Birkhoff
polytope $\mathcal{M}_n$ and $MS(n, c)$, the set of $n \times n$ matrices with non-negative integer entries and all row
and column sums equal to $c$. Elements of $MS(n, c)$ are called magic squares in the enumerative
literature. It is known that $|MS(n, c)|$ is a polynomial in $c$ of degree $(n-1)^2$. The leading coefficient
of this polynomial is a simple multiple of the volume of $\mathcal{M}_n$ [H]. See also [18].

Generalizing, the set of $m \times n$ matrices with non-negative entries and fixed row and column sums
is intensively studied both in combinatorics and statistics where they are called contingency tables.
It is known that exact enumeration of the size of this set is #P-complete even when $n = 2$. A
host of techniques for approximate counting and random generation have been developed as well
as a remarkable collection of asymptotic formulae. See [23] and [6] for surveys.

Questions of the properties of random contingency tables or randomly chosen points in polytopes 
are closely connected to the problem of estimating the volume of the polytopes. Important recent 
work by Barvinok and Hartigan has given asymptotic formulas for the number of contingency tables 
and the volumes of polytopes of such matrices [H [91 [7] as well as the closely related problem of the 
number of graphs with a given degree sequence [10]. A central idea in their analysis is the maximum
entropy distribution which for the Birkhoff polytopes corresponds to independent exponentials for
the entries of the matrix. This maximum entropy distribution provides a good approximation to
the distribution yielding (after much work) an asymptotic calculation of the volume.

Beyond asymptotic volume calculations Barvinok [6] also asked the question of "what does a 
random contingency table look like"? In [?] a precise sense was given to the statement that "in 
many respects a random matrix behaves as a matrix X of independent geometric random variables" , 
a direction pursued independently in this paper. One result of this equivalence given in [6] is 
that the sum of large subsets of the entries of such contingency tables are concentrated around 
their expectation given under the maximum entropy distribution. Barvinok [7] posed the natural 
question of determining the marginals of the entries of such random matrices. In the case of doubly 
stochastic matrices we answer this question determining that they are asymptotically independent 
exponentials. 
2. Marginals of Uniform Doubly Stochastic Matrices

Let $X = (X_{ij})_{i,j=1,\ldots,n}$ be a uniform doubly stochastic matrix, that is chosen uniformly from
the Birkhoff polytope. Since the rows and columns each sum to 1, it satisfies $2n - 1$ linear
constraints and the matrix is determined by the $(n-1)^2$ entries $(X_{ij})_{i,j=1,\ldots,n-1}$. Let
$\Gamma : \mathbb{R}^{(n-1)^2} \to \mathbb{R}^{n^2}$ denote the function $\Gamma(X) = (\Gamma(X)_{ij})_{i,j=1,\ldots,n}$ with

$$\Gamma(X)_{ij} = \begin{cases} X_{ij} & i, j \le n-1, \\ 1 - \sum_{l=1}^{n-1} X_{il} & i \le n-1,\ j = n, \\ 1 - \sum_{l=1}^{n-1} X_{lj} & i = n,\ j \le n-1, \\ 2 - n + \sum_{l=1}^{n-1}\sum_{l'=1}^{n-1} X_{ll'} & i = j = n. \end{cases}$$

Let $\Phi : \mathbb{R}^{n^2} \to \mathbb{R}^{(n-1)^2}$ be the projection $X \mapsto (X_{ij})_{1 \le i,j \le n-1}$. By an abuse of notation we will also
use $\Gamma$ as a function from $\mathbb{R}^{n^2}$ to itself by $\Gamma(\Phi(X))$. Then the doubly stochastic matrices correspond
to the $(n-1) \times (n-1)$ matrices in the set

$$S_n = \left\{ (x_{ij})_{i,j=1,\ldots,n-1} \in [0,1]^{(n-1)^2} : \min_{i,j} \Gamma(x)_{ij} \ge 0 \right\}.$$

The distribution of $(X_{ij})_{i,j=1,\ldots,n-1}$ is given by the uniform distribution on $S_n$. Let $Z_n$ denote the
volume of $S_n$, that is

$$Z_n = \int_{[0,1]^{(n-1)^2}} I(x \in S_n)\,dx$$

where $I$ denotes the indicator function. Canfield and McKay [13] showed that asymptotically the
volume of the Birkhoff polytope (in units of basic cells of the lattice which is equivalent to our
usage) is

$$Z_n = \frac{1}{n^{n-1}} \cdot \frac{1}{(2\pi)^{n-1/2}\, n^{(n-1)^2}} \exp\left( \frac{1}{3} + n^2 + o(1) \right). \tag{2.1}$$

Also define

$$V_n = \left\{ (y_{ij})_{i,j=1,\ldots,n} \in \mathbb{R}^{n^2} : \Phi(\tfrac{1}{n}y) \in S_n,\ \min_{i,j}\left( y - n\Gamma(\tfrac{1}{n}y) \right)_{ij} \ge 0 \right\}.$$

As we observed in the introduction, the uniformly distributed stochastic matrix shares many
properties with matrices of independent exponentials so let us define $(Y_{ij})_{1 \le i,j \le n}$ as a matrix of iid
exponential mean 1 random variables.
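Lemma 2.1. Conditional on $Y \in V_n$ we have that $\frac{1}{n}(Y_{ij})_{1 \le i,j \le n-1}$ is uniform on $S_n$. Further, for
large $n$ we have that,

$$P(Y \in V_n) \ge n^{-4n}. \tag{2.2}$$

Proof. Let $W$ be the product of the intervals $W = \prod_{1 \le i,j \le n} I_{ij}$ where

$$I_{ij} = \begin{cases} [0,\infty) & \text{if } \max\{i,j\} = n, \\ \{0\} & \text{otherwise.} \end{cases}$$

Then for each fixed $y \in S_n$ the set $\{Y \in V_n : (\tfrac{1}{n}Y_{ij})_{i,j=1,\ldots,n-1} = y\}$ is $n\Gamma(y) + W$. Since the
density of $Y$ depends only on $\sum_{i,j} y_{ij}$ and since $\sum_{i,j} n\Gamma(y)_{ij} = n^2$ it follows that $\Phi(\tfrac{1}{n}Y)$ is uniform
on $S_n$. Now

$$P(Y \in V_n) = \int \exp\Big( -\sum_{i=1}^{n}\sum_{j=1}^{n} y_{ij} \Big) I(y \in V_n)\, dy_{11} \cdots dy_{nn} \tag{2.3}$$

$$= \int \exp\Big( -\sum_{i=1}^{n}\sum_{j=1}^{n} n\Gamma(\tfrac{1}{n}y)_{ij} - \sum_{i,j} \big( y - n\Gamma(\tfrac{1}{n}y) \big)_{ij} \Big)\, I\big(\Phi(\tfrac{1}{n}y) \in S_n\big)\, I\Big( \min_{i,j} \big( y - n\Gamma(\tfrac{1}{n}y) \big)_{ij} \ge 0 \Big)\, dy_{11} \cdots dy_{nn} \tag{2.4}$$

$$= \mathrm{Vol}_{(n-1)^2}(nS_n) \exp(-n^2) \int_{[0,\infty)^{2n-1}} \exp\Big( -\sum_{i=1}^{n} y_{in} - \sum_{j=1}^{n-1} y_{nj} \Big)\, dy_{1n} \cdots dy_{nn}\, dy_{n1} \cdots dy_{n,n-1} \tag{2.5}$$

$$= \mathrm{Vol}_{(n-1)^2}(nS_n) \exp(-n^2). \tag{2.6}$$

Combining equations (2.1) and (2.6), and using $\mathrm{Vol}_{(n-1)^2}(nS_n) = n^{(n-1)^2} Z_n$, we have that

$$P(Y \in V_n) = \frac{1}{n^{n-1} (2\pi)^{n-1/2}\, n^{(n-1)^2}} \exp\Big( \frac{1}{3} + n^2 + o(1) \Big)\, n^{(n-1)^2} \exp(-n^2) \ge n^{-4n} \tag{2.7}$$

for large $n$.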



In particular this means for $X$ uniform on $\mathcal{M}_n$, for any measurable set $B \subset \mathbb{R}^{(n-1)^2}$, by equation
(2.7) we have that

$$P(\Phi(X) \in B) \le n^{4n} P(\Phi(\tfrac{1}{n}Y) \in B). \tag{2.8}$$

This equation is only meaningful when $P(\Phi(\tfrac{1}{n}Y) \in B) < n^{-4n}$. However, for a number of important
large deviation events we can effectively translate results about $Y$ to results about $X$. In particular
using the exchangeability of the entries of $X$ we can establish the asymptotic marginal distribution
of the entries of $X$ given in Theorem 1.



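Proof of Theorem 1. Let $A$ be a measurable subset of $[0, \infty)$. By the Azuma-Hoeffding inequality,
for some constant $c > 0$,

$$P\left( \left| \frac{1}{(n-1)^2} \sum_{i=1}^{n-1}\sum_{j=1}^{n-1} I(Y_{ij} \in A) - P(Y_{11} \in A) \right| > \frac{1}{2} n^{-1/2+\epsilon} \right) \le \exp(-cn^{1+2\epsilon}).$$

Then by equation (2.8) we have that,

$$P\left( \left| \frac{1}{(n-1)^2} \sum_{i=1}^{n-1}\sum_{j=1}^{n-1} I(nX_{ij} \in A) - P(Y_{11} \in A) \right| > \frac{1}{2} n^{-1/2+\epsilon} \right) \le n^{4n} \exp(-cn^{1+2\epsilon}) \le \exp(-c'n^{1+2\epsilon})$$

and so since the entries of $X$ are exchangeable,

$$|P(nX_{11} \in A) - P(Y_{11} \in A)| \le \frac{1}{2} n^{-1/2+\epsilon} + \exp(-c'n^{1+2\epsilon}).$$

As this holds uniformly over all $A$ it follows that $d_{TV}(nX_{11}, Y_{11}) \le n^{-1/2+\epsilon}$ for large $n$ which
establishes the result. ■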

2.1. Marginal distributions of submatrices. In this subsection we go beyond marginal
distributions and investigate the asymptotic distribution of sub-arrays of the matrix, in particular
showing that for boxes of sidelength almost $\sqrt{n}$ the entries are close to iid exponentials after
rescaling.

Fix some k = k{n) = O(j^r^)- Define W tli2 G K fc2 as the k x /c-submatrix of entries of the 
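matrix $Y_{ij}$ for $i \in \{(\ell_1 - 1)k + 1, \ldots, \ell_1 k\}$ and $j \in \{(\ell_2 - 1)k + 1, \ldots, \ell_2 k\}$, i.e.,

$$W_{\ell_1,\ell_2} = \begin{pmatrix} Y_{(\ell_1-1)k+1,\,(\ell_2-1)k+1} & \cdots & Y_{(\ell_1-1)k+1,\,\ell_2 k} \\ \vdots & \ddots & \vdots \\ Y_{\ell_1 k,\,(\ell_2-1)k+1} & \cdots & Y_{\ell_1 k,\,\ell_2 k} \end{pmatrix}.$$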

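Let $\epsilon > 0$ and let $A$ be a measurable subset of $\mathbb{R}^{k^2}$. By the Azuma-Hoeffding inequality we have
the following large deviations bound, for some constant $c > 0$:

$$P\left( \left| \left\lfloor \frac{n-1}{k} \right\rfloor^{-2} \sum_{\ell_1=1}^{\lfloor (n-1)/k \rfloor} \sum_{\ell_2=1}^{\lfloor (n-1)/k \rfloor} I(W_{\ell_1,\ell_2} \in A) - P(W_{1,1} \in A) \right| > \frac{1}{2}\epsilon \right) \le \exp\left( -c\,\epsilon^2 \left\lfloor \frac{n-1}{k} \right\rfloor^2 \right). \tag{2.9}$$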

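Now define $V_{\ell_1,\ell_2} \in \mathbb{R}^{k^2}$ as the $k \times k$ submatrix of $X_{ij}$ with $i \in \{(\ell_1 - 1)k + 1, \ldots, \ell_1 k\}$ and
$j \in \{(\ell_2 - 1)k + 1, \ldots, \ell_2 k\}$, i.e.,

$$V_{\ell_1,\ell_2} = \begin{pmatrix} X_{(\ell_1-1)k+1,\,(\ell_2-1)k+1} & \cdots & X_{(\ell_1-1)k+1,\,\ell_2 k} \\ \vdots & \ddots & \vdots \\ X_{\ell_1 k,\,(\ell_2-1)k+1} & \cdots & X_{\ell_1 k,\,\ell_2 k} \end{pmatrix}.$$

We now prove Theorem 4 showing that $d_{TV}(nV_{1,1}, W_{1,1})$ converges to 0.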

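Proof of Theorem 4. By equations (2.8) and (2.9) we have that,

$$P\left( \left| \left\lfloor \frac{n-1}{k} \right\rfloor^{-2} \sum_{\ell_1=1}^{\lfloor (n-1)/k \rfloor} \sum_{\ell_2=1}^{\lfloor (n-1)/k \rfloor} I(nV_{\ell_1,\ell_2} \in A) - P(W_{1,1} \in A) \right| > \frac{1}{2}\epsilon \right) \le n^{4n} \exp\left( -c\,\epsilon^2 \left\lfloor \frac{n-1}{k} \right\rfloor^2 \right) = o(1).$$

Since the entries of $X$ are exchangeable this implies that,

$$|P(nV_{1,1} \in A) - P(W_{1,1} \in A)| \le \frac{1}{2}\epsilon + o(1).$$

As this holds uniformly over all $A$ it follows that $d_{TV}(nV_{1,1}, W_{1,1}) \le \epsilon$ for large $n$ which establishes
the result.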
3. Further properties of uniform doubly stochastic matrices

In this section we establish further properties of the matrices including convergence of moments 
and the mixing time of such matrices. 

3.1. Non-constant row sums. It will be important to consider the generalized case of $m \times n$
matrices with fixed but non-constant row and column sums. For a sequence of positive row sums
$\{a_i\}_{i=1}^{m}$ and column sums $\{b_i\}_{i=1}^{n}$ where $\sum_{i=1}^{m} a_i = \sum_{i=1}^{n} b_i = t$ we define the transportation
polytope $p = p((a_i),(b_i))$ to be the polytope of $m \times n$ matrices with nonnegative entries, row
sums $a_i$ and column sums $b_i$. Let $\mathcal{P}_{m,n,t}$ denote the set of all such polytopes and let $p^* = p^*_{m,n,t}$
denote the special case of polytopes with constant row sums $t/m$ and column sums $t/n$. We will
let $\mathrm{Vol}_{(m-1)(n-1)}(p)$ denote the volume of the image of the set $p$ under the map

$$(x_{ij})_{i=1,\ldots,m,\ j=1,\ldots,n} \mapsto (x_{ij})_{i=1,\ldots,m-1,\ j=1,\ldots,n-1}$$

in $\mathbb{R}^{(m-1)(n-1)}$. The following lemma shows that amongst all polytopes of $m \times n$ matrices in
$\mathcal{P}_{m,n,t}$, $p^*$ has the largest volume.

Lemma 3.1. We have that

$$\mathrm{Vol}_{(m-1)(n-1)}(p^*_{m,n,t}) = \max_{p \in \mathcal{P}_{m,n,t}} \mathrm{Vol}_{(m-1)(n-1)}(p).$$

Proof. We begin by proving the following simpler claim.

Claim 3.2. Let $\{a_i\}_{i=1}^{m}$ be a collection of row sums with $\sum_{i=1}^{m} a_i = t$ and let $p(r)$ denote the
polytope of $m \times 2$ matrices with row sums $(a_i)$ and column sums $r, t - r$ for $0 \le r \le t$. Then

$$\mathrm{Vol}_{(m-1)}\, p(t/2) = \max_{0 \le r \le t} \mathrm{Vol}_{(m-1)}\, p(r).$$

Let $X = (X_{ij})_{i=1,\ldots,m,\ j=1,2}$ be chosen uniformly according to $p(r)$. Let $(Y_i)_{i=1,\ldots,m}$ be independent
random variables with the uniform distribution on $[0, a_i]$. It is easy to verify that $(X_{i1})_{i=1,\ldots,m}$ is equal
in distribution to $(Y_i)_{i=1,\ldots,m}$ conditional on $\sum_{i=1}^{m} Y_i = r$ and moreover that the volume $\mathrm{Vol}_{(m-1)}\, p(r)$
is proportional to the density of $\sum_{i=1}^{m} Y_i$ at $r$.

It remains to show that this density is maximized at $r = t/2$. We say a distribution is log-concave
if the logarithm of its density is concave. This clearly includes the uniform distribution on an
interval. Moreover, the sum of independent random variables with log-concave distributions itself
has a log-concave distribution [12]. Since the density of $\sum_{i=1}^{m} Y_i$ is symmetric about $t/2$ it follows
that it is maximized at $t/2$ which completes the claim.
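The claim can be illustrated numerically. The Python sketch below evaluates the density of a sum of independent uniforms on $[0, a_i]$ with the standard inclusion-exclusion formula $\frac{1}{(m-1)!\prod_i a_i} \sum_{S} (-1)^{|S|} (r - \sum_{i \in S} a_i)_+^{m-1}$; the formula is a classical fact, while the function name and the example row sums $(1,2,3)$ are our own illustrative choices.

```python
from itertools import combinations
from math import factorial, prod


def sum_uniform_density(r, a):
    """Density at r of Y_1 + ... + Y_m with Y_i uniform on [0, a_i],
    via inclusion-exclusion over subsets of the upper end-points."""
    m = len(a)
    total = 0.0
    for size in range(m + 1):
        for subset in combinations(a, size):
            shift = r - sum(subset)
            if shift > 0:
                total += (-1) ** size * shift ** (m - 1)
    return total / (factorial(m - 1) * prod(a))


# Row sums (1, 2, 3), so t = 6 and the claimed maximiser is r = t/2 = 3.
ROW_SUMS = (1.0, 2.0, 3.0)
GRID = [k / 10 for k in range(61)]
BEST_R = max(GRID, key=lambda r: sum_uniform_density(r, ROW_SUMS))
```

On this grid the density is symmetric about $r = 3$ and largest there, in line with the log-concavity argument.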

We now complete the proof of Lemma 3.1. Let $p = p((a_i),(b_i))$ and $p' = p((a_i),(b'_i))$ where
$b'_1 = b'_2 = \frac{b_1 + b_2}{2}$ and $b'_i = b_i$ for $i \ge 3$. Further define the set

$$\mathcal{A} = \Big\{ (\alpha_i)_{i=1}^{m} : 0 \le \alpha_i \le a_i,\ \sum_{i=1}^{m} \alpha_i = t - b_1 - b_2 \Big\}$$

which represents possible values for the sums of the entries of the rows of a matrix in $p$ excluding the
first two columns. Then by first conditioning on these sums we have the following integral for the
volumes

$$\mathrm{Vol}_{(m-1)(n-1)}\, p((a_i),(b_i)) = \int_{\mathcal{A}} \mathrm{Vol}_{(m-1)}\, p\big((a_i - \alpha_i), (b_i)_{i=1,2}\big)\ \mathrm{Vol}_{(m-1)(n-3)}\, p\big((\alpha_i), (b_i)_{i=3,\ldots,n}\big)\ \mu(d(\alpha_i))$$

where $\mu$ is the uniform distribution over $\mathcal{A}$. Similarly

$$\mathrm{Vol}_{(m-1)(n-1)}\, p((a_i),(b'_i)) = \int_{\mathcal{A}} \mathrm{Vol}_{(m-1)}\, p\big((a_i - \alpha_i), (b'_i)_{i=1,2}\big)\ \mathrm{Vol}_{(m-1)(n-3)}\, p\big((\alpha_i), (b'_i)_{i=3,\ldots,n}\big)\ \mu(d(\alpha_i)).$$

Applying Claim 3.2 we, therefore, have that

$$\mathrm{Vol}_{(m-1)(n-1)}\, p((a_i),(b_i)) \le \mathrm{Vol}_{(m-1)(n-1)}\, p((a_i),(b'_i))$$

which says that replacing the first two column sums by their average can only increase the volume
of the polytope. This is true of course for any pair of columns and similarly for any pair of rows.
It is easy to show that the volume of polytopes in $\mathcal{P}_{m,n,t}$ is symmetric and continuous in the row
and column sums $(a_i), (b_i)$ and hence it follows that $p^*$ must be a maximum of the volume. ■



Canfield and McKay [13] give an asymptotic formula for the volume of matrices with constant
row and column sums as

$$\mathrm{Vol}_{(m-1)(n-1)}(p^*_{m,n,m}) = \frac{1}{m^{(n-1)/2}\, n^{(m-1)/2}\, (2\pi)^{(m+n-1)/2}\, n^{(m-1)(n-1)}} \exp\left( \frac{1}{3} + mn - \frac{(m-n)^2}{12mn} + o(1) \right). \tag{3.1}$$

Note that our definition of volume corresponds to their notion of volume in units of basic cells of
the lattice induced by $\mathbb{Z}^{mn}$.

Let $\mathcal{R} = \mathcal{R}_{r,n}$ denote the $r(n-1)$-dimensional polytope of nonnegative $r \times n$ matrices whose rows
sum to 1. Let $\nu_r$ denote the measure on $\mathcal{R}$ induced by the first $r$ rows of a uniform doubly
stochastic $n \times n$ matrix $(X_{ij})$ and let $\mu_r$ denote the uniform probability measure on $\mathcal{R}$. Equivalently
$\mu_r$ is the measure induced by the first $r$ rows of a uniform stochastic matrix (one where the rows
are independent and conditioned to sum to 1).

Lemma 3.3. For a fixed integer $r \ge 1$ and $n > r$ the Radon-Nikodym derivative of the measures
$\mu_r$ and $\nu_r$ satisfies

$$\frac{d\nu_r}{d\mu_r} \le (1 + o(1))\, e^{r/2}$$

as $n \to \infty$.

Proof. Conditioned on the first $r$ rows of a uniform doubly stochastic $n \times n$ matrix $(X_{ij})$ the
remainder of the matrix is a uniformly chosen matrix from the polytope of $(n-r) \times n$ matrices

$$p\Big( 1_{n-r},\ \Big( 1 - \sum_{i=1}^{r} X_{ij} \Big)_{j=1,\ldots,n} \Big)$$

where $1_m$ represents the vector of 1's of length $m$. Since $\mu_r$ is the uniform distribution over
$\mathcal{R} = \mathcal{R}_{r,n}$ it follows that

$$\frac{d\nu_r}{d\mu_r}(X_{ij}) \propto \mathrm{Vol}_{(n-r-1)(n-1)}\, p\Big( 1_{n-r},\ \Big( 1 - \sum_{i=1}^{r} X_{ij} \Big)_{j=1,\ldots,n} \Big)$$

where $\propto$ denotes proportionality. To determine the constant of proportionality note that

$$Z_n = \mathrm{Vol}_{r(n-1)}(\mathcal{R}) \int_{\mathcal{R}} \mathrm{Vol}_{(n-r-1)(n-1)}\, p\Big( 1_{n-r},\ \Big( 1 - \sum_{i=1}^{r} X_{ij} \Big)_{j=1,\ldots,n} \Big)\ \mu_r(d(X_{ij}))$$

recalling that $Z_n$ is the volume of the Birkhoff polytope. It follows that

$$\frac{d\nu_r}{d\mu_r}(X_{ij}) = \frac{1}{Z_n} \mathrm{Vol}_{r(n-1)}(\mathcal{R})\ \mathrm{Vol}_{(n-r-1)(n-1)}\, p\Big( 1_{n-r},\ \Big( 1 - \sum_{i=1}^{r} X_{ij} \Big)_{j=1,\ldots,n} \Big) \le \frac{1}{Z_n} \mathrm{Vol}_{r(n-1)}(\mathcal{R})\ \mathrm{Vol}_{(n-r-1)(n-1)}\, p^*_{n-r,n,n-r}$$

by Lemma 3.1. Hence substituting the formulas for the volumes of the polytopes and applying
Stirling's formula we have that

$$\frac{d\nu_r}{d\mu_r}(X_{ij}) \le (1 + o(1))\, \frac{n^{n-1}\, (2\pi)^{n-1/2}\, n^{(n-1)^2}\, e^{-rn}}{(n-r)^{(n-1)/2}\, n^{(n-r-1)/2}\, ((n-1)!)^r\, (2\pi)^{(2n-r-1)/2}\, n^{(n-r-1)(n-1)}}$$

$$= (1 + o(1))\, n^{r/2} e^{r/2} \cdot \left( \frac{n}{\sqrt{2\pi n}\, n^n e^{-n}} \right)^{r} \cdot (2\pi)^{r/2}\, n^{r(n-1)}\, e^{-rn}$$

$$= (1 + o(1))\, e^{r/2}$$

which completes the proof.

This proof also shows that the uniformly distributed stochastic matrix is not given exactly by
the square of the absolute value of a random unitary matrix. In such a random matrix the rows
are distributed according to $\mu_1$ while we have that

$$\max \frac{d\nu_r}{d\mu_r} = \frac{1}{Z_n} \mathrm{Vol}_{r(n-1)}(\mathcal{R})\ \mathrm{Vol}_{(n-r-1)(n-1)}\, p^*_{n-r,n,n-r} = (1 + o(1))\, e^{r/2}.$$

Hence at least for large $n$ the models are not the same (in the trivial case of $n = 2$ they are equal).

3.2. Convergence of Moments. Using Lemma 3.3 we may now establish convergence of the
moments of the entries of a doubly stochastic matrix to those of independent exponentials. We will
let $(V_k)$ be a sequence of iid exponentially distributed mean 1 random variables.

Lemma 3.4. Let $(i_1, j_1), \ldots, (i_L, j_L)$ be a fixed sequence of pairs of positive integers and
$\alpha_1, \ldots, \alpha_L$ be a fixed sequence of positive integers. Then if $(X_{ij})_{i,j=1,\ldots,n}$ are distributed as a uniform
doubly stochastic matrix then

$$E \prod_{k=1}^{L} (nX_{i_k j_k})^{\alpha_k} \to E \prod_{k=1}^{L} V_k^{\alpha_k}.$$

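Proof. By Theorem 1 the joint distribution of the $(nX_{i_k j_k})_{k=1,\ldots,L}$ converges to iid exponential
random variables. It follows that

$$E\Big[ \prod_{k=1}^{L} (nX_{i_k j_k})^{\alpha_k}\, I\Big( \max_{1 \le k \le L} nX_{i_k j_k} \le M \Big) \Big] \to E\Big[ \prod_{k=1}^{L} V_k^{\alpha_k}\, I\Big( \max_{1 \le k \le L} V_k \le M \Big) \Big]$$

and hence we can complete the proof by showing that

$$\lim_{M \to \infty} \limsup_{n} E\Big[ \prod_{k=1}^{L} (nX_{i_k j_k})^{\alpha_k}\, I\Big( \max_{1 \le k \le L} nX_{i_k j_k} > M \Big) \Big] = 0. \tag{3.2}$$

By the exchangeability of $X$ we may assume without loss of generality that $\max_{1 \le k \le L} i_k \le L$ and
that $\max_{1 \le k \le L} j_k \le L$. In particular this assumption implies that each of the entries $X_{i_k j_k}$ appears
in the first $L$ rows of the matrix. Let $Y_{ij}$ denote a uniform stochastic matrix, that is one whose
rows are independent and chosen according to $\mu_1$.

Now by Lemma 3.3 it follows that

$$E\Big[ \prod_{k=1}^{L} (nX_{i_k j_k})^{\alpha_k}\, I\Big( \max_{1 \le k \le L} nX_{i_k j_k} > M \Big) \Big] \le (e^{L/2} + o(1))\, E\Big[ \prod_{k=1}^{L} (nY_{i_k j_k})^{\alpha_k}\, I\Big( \max_{1 \le k \le L} nY_{i_k j_k} > M \Big) \Big]$$

and hence it is sufficient to establish equation (3.2) replacing the $X_{i_k j_k}$ with $Y_{i_k j_k}$. Now the $Y_{i_k j_k}$
are given by Beta distributions $B(1, n-1)$. It follows that

$$E\, Y_{i_k j_k}^{\alpha_k} = \prod_{i=1}^{\alpha_k} \frac{i}{n + i - 1} = (1 + o(1))\, \alpha_k!\, n^{-\alpha_k}. \tag{3.3}$$

By the power mean inequality and the fact that $E|Y|^{\alpha} I(Y > M) \le M^{-1} E|Y|^{\alpha+1}$

$$E\Big[ \prod_{k=1}^{L} (nY_{i_k j_k})^{\alpha_k}\, I\Big( \max_{1 \le k \le L} nY_{i_k j_k} > M \Big) \Big] \le E\Big[ \frac{1}{\sum_{k=1}^{L} \alpha_k} \sum_{k=1}^{L} \alpha_k (nY_{i_k j_k})^{\sum_{l=1}^{L} \alpha_l}\, I\Big( \max_{1 \le k \le L} nY_{i_k j_k} > M \Big) \Big]$$

$$\le M^{-1} E\Big[ \frac{1}{\sum_{k=1}^{L} \alpha_k} \sum_{k=1}^{L} \alpha_k (nY_{i_k j_k})^{1 + \sum_{l=1}^{L} \alpha_l} \Big]$$

and hence by equation (3.3),

$$\lim_{M \to \infty} \limsup_{n} E\Big[ \prod_{k=1}^{L} (nY_{i_k j_k})^{\alpha_k}\, I\Big( \max_{1 \le k \le L} nY_{i_k j_k} > M \Big) \Big] = 0$$

which completes the proof.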



We may also examine the maximal element of the matrix. For an $n \times n$ matrix of iid exponential
random variables with mean 1 the maximum entry is at most $(2 + o(1)) \log n$ with high probability
and we show that this is also the case for the renormalized uniform doubly stochastic matrix.

Proof of Theorem 2. By Lemma 3.3 we have that

$$P(nX_{11} > (2 + \epsilon) \log n) \le (e^{1/2} + o(1))\, P(nY_{11} > (2 + \epsilon) \log n).$$

Now since $Y_{11}$ has $B(1, n-1)$ distribution

$$P(nY_{11} > (2 + \epsilon) \log n) = (n-1) \int_{\frac{(2+\epsilon)\log n}{n}}^{1} (1 - y)^{n-2}\, dy = \left( 1 - \frac{(2 + \epsilon) \log n}{n} \right)^{n-1} = (1 + o(1))\, n^{-2-\epsilon}. \tag{3.4}$$

The exchangeability of the entries and a union bound completes the proof. ■
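The tail computation in this proof can be checked numerically. The Python sketch below uses the exact Beta$(1, n-1)$ tail $P(Y > t) = (1-t)^{n-1}$ and compares it with $n^{-c}$ for $c = 2 + \epsilon$; the function names and the sample value $c = 2.5$ are illustrative assumptions.

```python
import math


def beta_tail(n, c):
    """P(n * Y > c * log n) for Y ~ Beta(1, n-1),
    which equals (1 - c log(n) / n)^(n-1) when c log(n) < n."""
    t = c * math.log(n) / n
    return (1.0 - t) ** (n - 1) if t < 1.0 else 0.0


def union_bound(n, c):
    """Union bound n^2 * P(n * Y > c * log n) over all n^2 entries."""
    return n * n * beta_tail(n, c)


for n in (10**3, 10**4, 10**5, 10**6):
    ratio = beta_tail(n, 2.5) / n ** (-2.5)
    print(n, ratio, union_bound(n, 2.5))
```

The ratio to $n^{-c}$ approaches 1 and the union bound decays like $n^{-\epsilon}$, here $n^{-1/2}$.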
3.3. Mixing Time. As uniformly distributed stochastic matrices correspond to the transition
matrices of Markov chains one can ask about the mixing time of such matrices.

Proof of Theorem LH By Lemma 3.3 the mixing time cannot be 1 since it implies that the rows of the matrix are not close to being constant. We show that at time 2, however, they are almost constant.



Let $X^{(2)}_{ij}$ denote the $ij$-th entry of the matrix $X^2$. The total variation distance from stationarity of the Markov chain at time 2 is given by

$$\max_{i}\ \frac12\sum_{j=1}^{n}\Big|\frac1n - X^{(2)}_{ij}\Big|$$

which is equal to

$$\max_{i}\ \frac1n\sum_{j=1}^{n}\max\big\{1 - nX^{(2)}_{ij},\,0\big\}.$$
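The two expressions for the distance at time 2 can be compared on a concrete doubly stochastic matrix. The sketch below is only an illustration: it builds a doubly stochastic matrix as an average of random permutation matrices, which is an assumption made for convenience and is not a sample from the uniform distribution studied here, then squares it and evaluates both formulas.

```python
import random


def random_doubly_stochastic(n, num_perms, rng):
    """Average of random permutation matrices: doubly stochastic, but NOT uniformly distributed."""
    matrix = [[0.0] * n for _ in range(n)]
    for _ in range(num_perms):
        perm = list(range(n))
        rng.shuffle(perm)
        for i, j in enumerate(perm):
            matrix[i][j] += 1.0 / num_perms
    return matrix


def square(matrix):
    """Two-step transition matrix X^2."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def tv_half_l1(p):
    """max_i (1/2) sum_j |1/n - p_ij|."""
    n = len(p)
    return max(0.5 * sum(abs(1.0 / n - v) for v in row) for row in p)


def tv_positive_part(p):
    """max_i (1/n) sum_j max{1 - n p_ij, 0}."""
    n = len(p)
    return max(sum(max(1.0 - n * v, 0.0) for v in row) / n for row in p)


if __name__ == "__main__":
    x = random_doubly_stochastic(8, 30, random.Random(3))
    x2 = square(x)
    print(tv_half_l1(x2), tv_positive_part(x2))
```

The two values agree because each row of $X^2$ sums to 1, so the positive and negative parts of $\frac1n - X^{(2)}_{ij}$ have equal total mass.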

Since the rows are exchangeable, by taking a union bound it is sufficient to show that for each $\epsilon > 0$,

$$P\Big(\frac1n\sum_{j=1}^{n}\max\big\{1 - nX^{(2)}_{1j},\,0\big\} > \epsilon\Big) = o(1/n).$$



We will again work first in the independent entries model $(Y_{ij})$. Let $\mathcal{P}$ denote the $\sigma$-algebra generated by $(Y_{1j})_{j=1,\dots,n-1}$ and let $\mathcal{H}$ denote the event



< max Y\j < 3 log n > n < Y\j < n — 3 log \ 
[i<j<n-i J ^ j=1 

The sums $\sum_{k=2}^{n-1} Y_{1k}Y_{kj}$ are conditionally independent given $\mathcal{P}$. Further for $\delta, \lambda > 0$,

P [ ~ J2 Y ^ Y kj < 1 - 5 and U | P J = P ( ~ £ Y lk (l - Y kj ) > 5 - 61ogn and % \ P J 

\ n fc=2 / \ n fc=2 / 



, n— 1 
A 



P exp - ^ F lfc (l - Y kj ) > exp (^(5-6 log n)) and ^ | P 



n 

, fc=2 



Now if Yif, < ^ log n and A = ^ - then by Taylor series for large n and 1 < j < n — 1, 
/x , . v . , exp(^Fi fc ) 1 + ^Y lk + (-Yii.) 2 //X n9n 

£ [exp (Ay lfc (i - Y kj )) | y lfc = ^ \' k) < " * fc Ay-" < ex P ((^i*) 2 ) 

1 + ^ifc ! + ^ r ifc 

Hence by Markov's inequality for large n, 

n-l \ Pvn f -J" 



P f-E y ^ < 1 - <5 and ^ I P ) < ^y^rr <expf--^) 

/ R ,n " (i '" ogn) V 21og 3 ny 



fe=2 / exp 

with room to spare. By the conditional independence of the sums we have that 

n 2 5 2 



P (# jl <j < n- 1 : ^Yy ik Y kj < 1 - <5 j > <5n and % | P^j < 



exp 



2 log 3 n 



(3.5) 
This implies that

n 2 5 2 



exp 



21og 3 ra. 

We can now return to the doubly stochastic matrix setting. By equation (2.8) we have that

n 2 5 2 



and hence since = X]fe=i ^lk^kj > Ylk=2 XikXkj an d so 



exp 



2 log 3 n 




max<i - -X$\o\ > 25 + -, max X ln > I = o(l/n)). 



By equation (3.4) we have that



so it follows that 



Pi max X ln > = 0(n" 2 ) 

l<i<n n 



1/n) 



for any $\delta > 0$. Letting $\delta$ go to 0 completes the proof. ■

4. Singular Values 

In this section we give the proof of Theorem 3. Let $0 \le \sigma_1^n \le \cdots \le \sigma_n^n$ denote the singular values of $n^{1/2}(X - EX)$. These correspond to the square roots of the eigenvalues of the matrix $n(X - EX)(X - EX)^*$ which is a Hermitian matrix. For a Hermitian matrix $A$ let $\lambda_1(A) \le \dots \le \lambda_n(A)$ denote its eigenvalues and let $\hat\mu(A) = \frac1n\sum_{i=1}^{n}\delta_{\lambda_i(A)}$ denote the empirical spectrum of $A$.

Let $(Y_{ij})_{i,j=1,\dots,n}$ denote the $n \times n$ matrix with i.i.d. entries supported in $[0,K]$ and consider the Wishart matrix $\Xi_n = n^{-1}(Y - EY)(Y - EY)^*$ which is Hermitian and hence has real eigenvalues. Marčenko and Pastur [31] showed that $\hat\mu(\Xi_n) \to \mu'$ weakly in probability as $n \to \infty$ where $\mu'$ is the

distribution on [0,2] with density ^ % ^ x ^ ■ 

As with our previous results we use large deviation results on random matrices to transfer results to uniform doubly stochastic matrices. In this case we use results of Guionnet and Zeitouni [26] who establish concentration of measure results for the spectrum of large Wishart matrices. In Corollary 1.8 and the remarks that follow they show that for any $\epsilon > 0$ there exists $c(\epsilon) > 0$ such that for large $n$ and $K \ge 1$,

$$P\big(d_W(\hat\mu(\Xi_n),\, E\hat\mu(\Xi_n)) > \epsilon\big) \le \exp\big(-cK^{-2}n^{2}\big) \tag{4.1}$$

where $d_W$ denotes the Wasserstein distance. We will take the entries of $Y$ to have density given by

$$p_n(x) = \begin{cases}\dfrac{e^{-x}}{1 - n^{-10}} & x \in [0, 10\log n],\\[4pt] 0 & \text{o.w.}\end{cases} \tag{4.2}$$

That is the entries are mean 1 exponentials conditioned to be less than $10 \log n$ and so it follows that

$$P\big(d_W(\hat\mu(\Xi_n),\, E\hat\mu(\Xi_n)) > \epsilon\big) \le \exp\big(-c'n^{2}\log^{-2} n\big). \tag{4.3}$$
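The truncated exponential density in (4.2) is simple to work with directly. The sketch below, whose function names are illustrative, implements the density, an inverse-CDF sampler, and the closed-form mean $\frac{1-(1+10\log n)\,n^{-10}}{1-n^{-10}}$ that is used as $y = EY_{11}$ later in this section, and checks the mean against a midpoint-rule integral.

```python
import math
import random


def truncation_level(n: int) -> float:
    """Upper end of the support, 10 log n."""
    return 10.0 * math.log(n)


def density(x: float, n: int) -> float:
    """p_n(x) = e^{-x} / (1 - n^{-10}) on [0, 10 log n], zero elsewhere."""
    if 0.0 <= x <= truncation_level(n):
        return math.exp(-x) / (1.0 - n ** -10.0)
    return 0.0


def sample(n: int, rng: random.Random) -> float:
    """Inverse-CDF draw: solve (1 - e^{-x}) / (1 - n^{-10}) = u for x."""
    u = rng.random()
    return -math.log1p(-u * (1.0 - n ** -10.0))


def exact_mean(n: int) -> float:
    """Closed-form mean of the truncated exponential."""
    k = truncation_level(n)
    return (1.0 - (1.0 + k) * n ** -10.0) / (1.0 - n ** -10.0)


def numeric_mean(n: int, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of x * p_n(x)."""
    h = truncation_level(n) / steps
    return sum((i + 0.5) * h * density((i + 0.5) * h, n) for i in range(steps)) * h


if __name__ == "__main__":
    # n = 2 is used only so that the truncation is visible numerically.
    print(exact_mean(2), numeric_mean(2))
```

For the values of $n$ relevant to the proof the truncation removes mass $n^{-10}$, so the mean differs from 1 by a negligible amount.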

Now let

$$\tilde S_n = \Big\{(x_{ij})_{i,j=1,\dots,n-1} \in S_n : \max_{1\le i,j\le n}\Gamma(x)_{ij} \le \tfrac{6}{n}\log n\Big\}$$

which corresponds to the doubly stochastic matrices whose maximum entry is at most $\frac{6}{n}\log n$. Also define

$$\mathcal{D}_n = \Big\{(x_{ij})_{i,j=1,\dots,n} \in [0, 8\log n]^{n^2} : \tfrac1n(x_{ij})_{i,j=1,\dots,n-1} \in \tilde S_n,\ \forall\, 1\le i,j\le n,\ 0 \le (x - \Gamma(x))_{ij} \le n^{-4}\Big\}.$$

The following lemma is the analogue of Lemma 2.1 for $Y$.

Lemma 4.1. With $Y$ as above with marginals given by (4.2), conditional on $Y \in \mathcal{D}_n$ we have that $\Gamma(\frac1n Y)$ is uniform on $\tilde S_n$. Further, for large $n$ we have that,

$$P(Y \in \mathcal{D}_n) \ge n^{-8n}. \tag{4.4}$$

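Proof. Let $W$ be the product of the intervals $W = \prod_{1\le i,j\le n} A_{ij}$ where

$$A_{ij} = \begin{cases}[0, n^{-4}] & \text{if } \max\{i,j\} = n,\\ \{0\} & \text{o.w.}\end{cases}$$

Then for each fixed $\tilde Y \in \tilde S_n$ the set $\{Y \in \mathcal{D}_n : \Gamma(\frac1n Y) = \Gamma(\tilde Y)\}$ is $n\Gamma(\tilde Y) + W$. Since the density of $Y$ depends only on $\sum_{i,j} Y_{ij}$ and since $\sum_{i,j} n\Gamma(\tilde Y)_{ij} = n^2$ it follows that $\Gamma(\frac1n Y)$ is uniform on $\tilde S_n$.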
Now 

P(Y G P n ) = (1 - n- 10 ) 

= (l + o(l))exp(-n 2 )n n2 Vol n2 (P n ) (4.5) 
as for all $Y \in \mathcal{D}_n$ we have that

$$n^2 \le \sum_{i=1}^{n}\sum_{j=1}^{n} Y_{ij} \le n^2 + (2n+1)n^{-4}.$$

The volume of $W$ is clearly $n^{-4(2n-1)}$ so we have that

$$\mathrm{Vol}_{n^2}(\mathcal{D}_n) = \mathrm{Vol}_{(n-1)^2}(\tilde S_n)\, n^{-4(2n-1)}.$$

Now interpreting $\tilde S_n$ as a subset of $S_n$ it corresponds to the set of doubly stochastic matrices whose maximum entry is at most $\frac{6}{n}\log n$. Hence by Theorem 2 we have that

$$\frac{\mathrm{Vol}_{(n-1)^2}(\tilde S_n)}{\mathrm{Vol}_{(n-1)^2}(S_n)} = P\Big(\max_{i,j} X_{ij} \le \tfrac{6}{n}\log n\Big) = 1 - o(1). \tag{4.6}$$

Combining equations (2.1), (2.7), (4.6) we have that



„ I n n \ 

~ n2 L ex p - y y d 2/n • • • 



nrT, X \ ^ N exp(-n 2 )n n V 4 ^" 1 ) /l 2 

P(Y G 2> B ) = (1 + o(l)) J_ l 1(2 ; )w _ 1/2re(w _ 1)2 exp (- + n 2 

> n~ 8n (4.7) 
for large n. ■ 
Now the Courant-Fischer Minimax Theorem says that for an $n \times n$ Hermitian matrix $X$ the $k$-th eigenvalue of $X$ is given by

$$\lambda_k(X) = \min_{U:\,\dim(U)=k}\ \max_{x \in U}\ \frac{x^* X x}{x^* x}$$

where the minimum is over all $k$-dimensional subspaces of $\mathbb{R}^n$. It follows that for Hermitian matrices $X$, $Y$ that

$$|\lambda_k(X) - \lambda_k(Y)| \le \|X - Y\|_{op} \le n\max_{i,j}|X_{ij} - Y_{ij}|$$

where $\|\cdot\|_{op}$ is the operator norm (see e.g. [2H]). For $\tilde Y \in \tilde S_n$ and $Y \in \mathcal{D}_n$ such that $\Gamma(\frac1n Y) = \Gamma(\tilde Y)$ we compare the eigenvalues of the matrices

$$A = n\big(\Gamma(\tilde Y) - n^{-1}\mathbf{1}\big)\big(\Gamma(\tilde Y) - n^{-1}\mathbf{1}\big)^*, \qquad B = n^{-1}(Y - y\mathbf{1})(Y - y\mathbf{1})^*$$

where $y = \dfrac{1 - (1 + 10\log n)\,n^{-10}}{1 - n^{-10}} = EY_{11}$ and $\mathbf{1}$ is the $n \times n$ matrix of all 1's. By the above bound we have that for $1 \le k \le n$,

$$|\lambda_k(A) - \lambda_k(B)| \le n\max_{i,j}|A_{ij} - B_{ij}|. \tag{4.8}$$
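The eigenvalue perturbation bound used above can be checked by hand in the smallest nontrivial case. The sketch below restricts to real symmetric $2 \times 2$ matrices, where eigenvalues have a closed form, and compares the largest eigenvalue shift with the operator norm of the difference and with $n\max_{i,j}|X_{ij} - Y_{ij}|$ for $n = 2$. The encoding of a matrix as a triple `(a, b, d)` and the function names are choices made for this illustration.

```python
import math
import random


def eigenvalues_sym2(a: float, b: float, d: float):
    """Eigenvalues, in increasing order, of the symmetric matrix [[a, b], [b, d]]."""
    center = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return (center - radius, center + radius)


def weyl_bounds(m1, m2):
    """Return (max_k |lambda_k(m1) - lambda_k(m2)|, ||m1 - m2||_op, 2 * max entry difference)."""
    l1 = eigenvalues_sym2(*m1)
    l2 = eigenvalues_sym2(*m2)
    diff = tuple(u - v for u, v in zip(m1, m2))
    # For a symmetric matrix the operator norm is the largest absolute eigenvalue.
    op_norm = max(abs(v) for v in eigenvalues_sym2(*diff))
    entry_bound = 2.0 * max(abs(v) for v in diff)
    eig_gap = max(abs(u - v) for u, v in zip(l1, l2))
    return eig_gap, op_norm, entry_bound


if __name__ == "__main__":
    rng = random.Random(5)  # arbitrary seed
    m1 = tuple(rng.uniform(-1.0, 1.0) for _ in range(3))
    m2 = tuple(rng.uniform(-1.0, 1.0) for _ in range(3))
    print(weyl_bounds(m1, m2))
```

In every draw the three returned numbers should be nondecreasing, matching the chain of inequalities in the display.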



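Breaking $A - B$ into parts we first have that

$$\sup_{i,j}\Big|\big(n\Gamma(\tilde Y)^2 - n^{-1}Y^2\big)_{ij}\Big| = \sup_{i,j}\ n^{-1}\Big|\big(2(n\Gamma(\tilde Y))(Y - n\Gamma(\tilde Y)) + [Y - n\Gamma(\tilde Y)]^2\big)_{ij}\Big| = O(n^{-3})$$

since $\max_{i,j}(n\Gamma(\tilde Y))_{ij} \le 6\log n$ and $\max_{i,j}|(Y - n\Gamma(\tilde Y))_{ij}| \le n^{-4}$. Also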
nT(Y) • n" 1 ! - n^Y ■ yl) 



(4.9) 



sup 



sup 
sup 

i,j 

0{n 



T{Y)-n- L yY 1 



T(Y) -n -1 F) 1 



+ 0(n 



-10\ 



since 1 - y = 0(n- w ). Finally we have that 



sup 



' n 1 1 — n 1 y 2 l' 



0(n 



-10\ 



since 1 - y 2 = 0(n~ 10 ). Combining ([47T2]) . (l49j) . (j47TO|) and KTTh it follows that 

$$|\lambda_k(A) - \lambda_k(B)| \le O(n^{-2}).$$



(4.10) 
(4.11) 

(4.12) 



In particular we have that for large $n$, $d_W(\hat\mu(A), \hat\mu(B)) = o(1)$ uniformly in $\tilde Y$ and $Y$. With $\Xi_n$ defined above and $X$ a uniform doubly stochastic matrix, by Lemma 4.1 we have that for any $\epsilon > 0$ and large enough $n$ that

$$\begin{aligned}
&P\big(d_W(\hat\mu(n(X - EX)(X - EX)^*),\, E\hat\mu(\Xi_n)) > 2\epsilon \,\big|\, \Phi(X) \in \tilde S_n\big)\\
&\qquad\le P\big(d_W(\hat\mu(\Xi_n),\, E\hat\mu(\Xi_n)) > \epsilon \,\big|\, Y \in \mathcal{D}_n\big)\\
&\qquad\le P\big(d_W(\hat\mu(\Xi_n),\, E\hat\mu(\Xi_n)) > \epsilon\big)\, P(Y \in \mathcal{D}_n)^{-1}\\
&\qquad\le n^{8n}\exp\big(-c'n^{2}\log^{-2} n\big) = o(1)
\end{aligned} \tag{4.13}$$

where the final inequality follows from Lemma 4.1 and equation (4.3). Now by Theorem 2,

$$P\big(\Phi(X) \in \tilde S_n\big) \to 1$$

so

$$P\big(d_W(\hat\mu(n(X - EX)(X - EX)^*),\, E\hat\mu(\Xi_n)) > 2\epsilon\big) \to 0$$

as $n \to \infty$. As $E\hat\mu(\Xi_n) \to \mu'$ (see e.g. [SIE]) it follows that

$$\hat\mu\big(n(X - EX)(X - EX)^*\big) \to \mu'$$

weakly in probability as $n \to \infty$. Since the singular values of $n^{1/2}(X - EX)$ are the positive square roots of the eigenvalues of $n(X - EX)(X - EX)^*$ and the map $x \mapsto x^2$ maps $\mu$ to $\mu'$ this completes the proof of Theorem 3.